-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(optimizer): Implement greedy join reordering #3538
base: main
Are you sure you want to change the base?
Conversation
CodSpeed Performance ReportMerging #3538 will degrade performances by 49.57%Comparing Summary
Benchmarks breakdown
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3538 +/- ##
==========================================
+ Coverage 77.69% 77.72% +0.03%
==========================================
Files 710 714 +4
Lines 86896 87389 +493
==========================================
+ Hits 67513 67923 +410
- Misses 19383 19466 +83
|
} | ||
} | ||
|
||
impl Display for JoinEdge { | ||
pub(crate) struct JoinAdjList( | ||
pub HashMap<LogicalPlanRef, HashMap<LogicalPlanRef, Vec<JoinCondition>>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would strong avoid using the LogicalPlanRef
as a hash key here. Whenever you do and insert or a look up the logical plan has to be hashed and then equity has to be computed. I would instead just assign each Node an incrementing ID or use the Arc::ptr
as the ID
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively you can also use something like a
struct GraphNodeKey {
Id: usize,
plan: LogicalPlanRef // ignored for hash and eq calcuations
}
#[derive(Debug)] | ||
struct JoinEdge(JoinNode, JoinNode); | ||
#[derive(Clone, Debug)] | ||
pub(crate) struct JoinCondition { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do these need to pub(crate)
or just local to the file pub(self)
or nothing at all
|
||
impl GreedyJoinOrderer { | ||
/// Consumes the join graph and transforms it into a logical plan with joins reordered. | ||
pub(crate) fn compute_join_order(join_graph: &mut JoinGraph) -> DaftResult<LogicalPlanRef> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is doing too much (reordering and creating the logical plan) and isn't really that modular.
Instead what I would recommend is the following:
Trait JoinReorderer {
fn reorder(&self, &mut graph: JoinGraph)
}
LogicalPlan -> JoinGraphBuilder -> JoinGraph -> JoinReorderer.reorder(JoinGraph) -> JoinGraph -> LogicalPlan
This way you get two things:
- must easier to test that the JoinGraph roundtrip is working correctly
- much easier to plug in a JoinReorderer (just implement that Trait)
Implements the Greedy Operator Ordering algorithm for join reordering.
With
GreedyJoinOrderer
, we can now transform join graphs back into logical plans with joins greedily reordered. Although we ultimately still want to apply DP-based join order algorithms, this greedy algorithm is still necessary when the number of join conditions is large and DP-based algorithms (which run in O(3^n), where n is the number of relations) are too expensive.